Choose your own¶
Here we choose one algorithm, train it as a classifier, and measure its accuracy. We will start with k-nearest neighbors.
Let us import the necessary modules first.
In [1]:
import sys
from class_vis import prettyPicture
from prep_terrain_data import makeTerrainData
from sklearn import tree
from sklearn.metrics import accuracy_score
import numpy as np
import pylab as pl
features_train, labels_train, features_test, labels_test = makeTerrainData()
Let us first check that a decision tree works. This gives us a skeleton to build on.
In [2]:
def classify(features_train, labels_train, min_samples=2):
    # Return a trained decision tree classifier.
    X = features_train
    Y = labels_train
    clf = tree.DecisionTreeClassifier(min_samples_split=min_samples)
    clf = clf.fit(X, Y)
    return clf
# train
clf_2 = classify(features_train, labels_train, 2)
# predict
labels_pred_2 = clf_2.predict(features_test)
# accuracy
acc_min_samples_split_2 = accuracy_score(labels_test, labels_pred_2)
print('Accuracy: {}'.format(acc_min_samples_split_2))
In [3]:
prettyPicture(clf_2, features_test, labels_test)
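The `min_samples` knob above is worth exploring before moving on. The sketch below sweeps `min_samples_split` to show how it affects test accuracy; since `makeTerrainData` and `prettyPicture` are not available outside this project, it substitutes scikit-learn's `make_moons` as a stand-in noisy two-class dataset (an assumption, not the notebook's actual data).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.metrics import accuracy_score

# Stand-in dataset: makeTerrainData is not available here, so we
# use a noisy two-class toy dataset from scikit-learn instead.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Sweep min_samples_split: larger values prune the tree more
# aggressively, which can reduce overfitting on noisy data.
results = {}
for min_samples in (2, 10, 50):
    clf = tree.DecisionTreeClassifier(min_samples_split=min_samples,
                                      random_state=42)
    clf.fit(X_train, y_train)
    results[min_samples] = accuracy_score(y_test, clf.predict(X_test))
    print('min_samples_split={:>2}: accuracy={:.3f}'.format(
        min_samples, results[min_samples]))
```

On this kind of noisy data, the default `min_samples_split=2` tends to overfit; a moderately larger value often generalizes better, which is why it is exposed as a parameter in `classify` above.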
Let us next try the k-nearest neighbors classifier from scikit-learn.
In [4]:
from sklearn.neighbors import KNeighborsClassifier

def classify(features_train, labels_train, n_neighbors=3):
    # Return a trained k-nearest neighbors classifier.
    X = features_train
    Y = labels_train
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf = clf.fit(X, Y)
    return clf
# train
clf_k = classify(features_train, labels_train, 3)
# predict
labels_pred_k = clf_k.predict(features_test)
# accuracy
acc_k = accuracy_score(labels_test, labels_pred_k)
print('Accuracy: {}'.format(acc_k))
In [5]:
prettyPicture(clf_k, features_test, labels_test)
Improve accuracy?¶
Let us try increasing the number of neighbors.
In [6]:
# train
n_neighbors = 4
clf_k = classify(features_train, labels_train, n_neighbors)
# predict
labels_pred_k = clf_k.predict(features_test)
# accuracy
acc_k = accuracy_score(labels_test, labels_pred_k)
print('Accuracy: {}'.format(acc_k))
Beyond 4 neighbors the accuracy decreases, and it is also lower below 3, so n_neighbors=4 looks optimal for now.
An accuracy of about 94% is a good result.
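Rather than trying values one at a time, we can sweep a range of k and pick the best by test accuracy. The sketch below does this on a stand-in dataset (scikit-learn's `make_moons`, since `makeTerrainData` is not available here); the best k it finds applies to that toy data, not necessarily to the terrain data above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Stand-in dataset: a noisy two-class problem in place of makeTerrainData.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Record test accuracy for each k in a range.
scores = {}
for k in range(1, 11):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, clf.predict(X_test))

best_k = max(scores, key=scores.get)
print('best k = {}, accuracy = {:.3f}'.format(best_k, scores[best_k]))
```

Note that picking k on the same test set we report accuracy on is a mild form of peeking; for a more honest estimate, select k by cross-validation on the training set (e.g. with `sklearn.model_selection.GridSearchCV`) and report accuracy on the held-out test set once.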